THREE DIMENSIONAL SPATIAL PANORAMA FORMATION WITH A

RANGE IMAGING SYSTEM 

FIELD OF THE INVENTION 

The invention relates generally to the field of panoramic imaging 
technology, and in particular to the field of forming a complete three-dimensional 
panoramic scene. 

BACKGROUND OF THE INVENTION 

Panoramic imaging technology has been used for merging multiple 
photographs or digital images to produce a single seamless 360° panoramic view 
of a particular scene. A single photographic camera is usually employed in such a 
way that a sequence of image inputs is obtained as the camera is rotated around 
the focal point of the camera lens causing every two neighboring images to 
slightly overlap each other. The intensity values from the two neighboring images 
in the overlap region are weighted and then summed to form a smooth transition. 
The resultant panorama provides a 2D (two-dimensional) description of the 
environment. 
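The weighted-sum transition described above is commonly implemented as linear "feathering". The sketch below is a minimal illustration only. It assumes grayscale images whose overlap is a known number of columns and a linear weight ramp, because the text does not specify the weighting profile. `feather_blend` is a hypothetical name.

```python
import numpy as np

def feather_blend(left: np.ndarray, right: np.ndarray, overlap: int) -> np.ndarray:
    """Stitch two 2-D grayscale images that share `overlap` columns.

    The last `overlap` columns of `left` show the same scene as the first
    `overlap` columns of `right`. Weights ramp linearly across the overlap so
    the result moves smoothly from the left image to the right image.
    """
    if left.shape[0] != right.shape[0]:
        raise ValueError("images must have the same height")
    if not 0 < overlap <= min(left.shape[1], right.shape[1]):
        raise ValueError("invalid overlap width")
    # Weight of the left image: 1 at the start of the overlap, 0 at its end.
    w = np.linspace(1.0, 0.0, overlap)
    blended = w * left[:, -overlap:] + (1.0 - w) * right[:, :overlap]
    return np.hstack([left[:, :-overlap], blended, right[:, overlap:]])
```

For example, blending a 2 x 5 image of ones with a 2 x 4 image of zeros over 3 columns gives rows `[1, 1, 1, 0.5, 0, 0]`.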

There is a wide range of potential applications that require not
only intensity panoramas but also panoramic three-dimensional (3D) maps
associated with the intensity images, that is, a 3D description of the environment.
Virtual reality (VR) technology and e-commerce are example applications where a 3D
panorama plays a crucial role. Virtual worlds and virtual objects can be built using the 3D
panorama and displayed with the help of VRML (Virtual Reality Modeling
Language); see Ames et al., VRML 2.0 Sourcebook, Second Edition, Positioning
Shapes, Chapter 5, pp. 63-75.

In order to obtain both intensity and 3D panoramas, multiple
cameras are usually utilized in constructing a panoramic 3D imaging
system. There have been systems producing depth panoramic images; see Huang
et al., "Panoramic Stereo Imaging System with Automatic Disparity Warping and
Seaming", Graphical Models and Image Processing, Vol. 60, No. 3, May 1998,
pp. 196-208. Huang's system utilizes a side-by-side camera system to imitate a
human viewer. Another such system is described in commonly-assigned U.S.
Patent No. 6,023,588 issued February 8, 2000 to Ray et al., and entitled "Method
and Apparatus for Capturing Panoramic Images with Range Data". Ray's system
displaces the cameras vertically such that the line between the rear nodal points of
the cameras is aligned with the rotation axis.

Stereo vision techniques are commonly used in multiple camera
systems to recover spatial information of the scene. Such systems yield a 3D
range image where the range values may not be defined at every pixel. Imaging
systems that are capable of recovering range values at every pixel (full 3D range
recovery) are known in the art. For example, Cyberware, Inc. manufactures a
system whereby a laser is scanned across a scene. Another method, described in
U.S. Patent 4,935,616 (and further described in the Sandia Lab News, vol. 46, No.
19, September 16, 1994), provides a scannerless range imaging system using either
an amplitude-modulated high-power laser diode or an array of amplitude-modulated
light emitting diodes (LEDs) to completely illuminate a target scene.
An improved scannerless range imaging system that is capable of yielding color
intensity images in addition to the 3D range images is described in commonly-assigned,
copending U.S. Patent Application Serial No. 09/572,522, filed May 17,
2000 and entitled "Method and Apparatus for a Color Scannerless Range Imaging
System". As used herein, a scannerless range imaging system will be referred to
as an "SRI camera", and such a system is used in producing both intensity and 3D
panoramas.

The SRI camera may be mounted to swivel at the nodal point at
angular intervals and produce images; moreover, as described in commonly-assigned
U.S. Patent No. 6,118,946, these images may be captured as image
bundles that are used to generate intensity and 3D range images. Like the
conventional two-dimensional panorama formed by stitching two neighboring
intensity images together, the three-dimensional panorama is constructed by
stitching neighboring 3D images. However, problems arise when two adjacent 3D
images in a sequence are merged. The 3D values of an object point measured by
the SRI camera system are defined with respect to the local three-dimensional
coordinate system that is fixed relative to the camera optical system. The
computed 3D values of an object point in real-world space are therefore a function
of the orientation of the camera optical axis.

Because of the nature of the SRI system, there is a further problem
that must be addressed when merging two adjacent range images. The SRI system
actually yields phase values that describe the phase offset for each pixel relative to
one wavelength of the modulated illumination. These phase values are then
converted to range values (because the modulation frequency is known). This
leads to two types of ambiguity. First, if the objects in the scene differ in
distance by more than one wavelength of the modulated illumination, the
computed range values will reflect discontinuities where the corresponding phase
values transitioned from one cycle to the next. This ambiguity problem can be
solved by the method described in commonly-assigned, copending U.S. Patent
Application Serial No. 09/449,101, which was filed November 24, 1999 in the
names of N.D. Cahill et al. and entitled "Method for Unambiguous Range
Detection". Even if the first type of ambiguity is resolved, a second type of
ambiguity exists. This ambiguity arises because the phase values returned by the
SRI system do not contain any information about absolute distance to the camera.
The information captured by the SRI system is only sufficient to generate relative
range values, not absolute range values. Therefore, the absolute range values
differ from the values computed and returned by the SRI system in the range images
by some unknown constant. In general, the unknown constant for a given range
image is not the same as the unknown constant for another range image. This
presents a problem when attempting to merge/stitch two adjacent range images
captured from the SRI system: if the unknown constants are not the same, the two
images cannot be merged continuously.

Therefore, two problems emerge. The first problem is that the
computed 3D values in a given image are not absolutely known; they are only
known relative to the other objects in the same image. Thus, an unknown constant
offset must be added to every 3D value in the image. However, the constant
offsets in subsequent 3D images may be different, and the difference in offsets
must be determined in order to correctly merge the 3D values from neighboring
scenes. Even if the first problem is solved, the 3D values of an object point in
subsequent images still depend on the orientation of the camera optical axis for
each image. Consequently, distortion appears when a sequence of 3D images is
used to describe the shape of an object. For instance, a smooth surface object in
three-dimensional space appears as a fragmented smooth surface object after
reconstruction using the untreated 3D images. Three methods have been shown
to address the second problem in panoramic 3D map formation. Each method
comprises transforming 3D values into some reference coordinate system. As
described in commonly assigned, copending U.S. Patent Application Serial No.
09/383,573, filed August 25, 1999 in the names of Nathan D. Cahill and Shoupu
Chen, and entitled "Method For Creating Environment Map Containing
Information Extracted From Stereo Image Pairs", a directional transformation
transforms 3D values by projecting points orthographically into a reference plane.
As also described in Serial No. 09/383,573, a perspective transformation
transforms 3D values by projecting points to the common nodal axis. As
described in commonly assigned, copending U.S. Patent Application Serial No.
09/686,610, filed 11 October 2000 in the names of Lawrence A. Ray and Shoupu
Chen, and entitled "Method for Three Dimensional Spatial Panorama Formation",
an (X,Y,Z) transformation transforms 3D values into 3-element vectors
describing orthographic range to a reference system.

Even though all of these approaches eliminate the problem of 
individual range images being defined in different coordinate systems, they are 
useless in the SRI camera system unless the difference in constant range offsets 
between subsequent images is determined. 

SUMMARY OF THE INVENTION 

It is an object of the invention to provide a range imaging system 
capable of generating 3D spatial panoramas. 

It is a further object of this invention to provide a method whereby 
the difference between the unknown constants of adjacent range images is 
determined, and that difference is used to merge/stitch adjacent range images in a 
continuous manner. 



The present invention is directed to overcoming one or more of the
problems set forth above. Briefly summarized, according to one aspect of the
present invention, a method for deriving a three-dimensional panorama from a
plurality of images of a scene generated by a range imaging camera of the type
that produces ambiguities in range information includes the steps of: (a) acquiring
a plurality of adjacent images of the scene, wherein there is an overlap region
between the adjacent images and at least some of the adjacent images are range
images; (b) providing offset data for the range images in order to recover
corrected relative scene spatial information and provide a corrected range image;
and (c) deriving a three-dimensional panorama from the corrected range image.
In order to provide offset data, a relative range difference between adjacent range
images is detected as a constant offset between the adjacent images, and the
constant offset is applied to at least one of the adjacent range images to correct for
ambiguities in the relative ranges of the range images.

The invention further includes a method, a system, and a computer
program product for deriving a three-dimensional panorama from a plurality of
images of a scene generated from an SRI camera that generates 3D range values for
the images with respect to a local three-dimensional coordinate system wherein
the image is captured. The invention involves acquiring a plurality of images of
the scene by rotating the camera about a Y-axis (vertical axis); determining the
difference in constant offsets for the relative 3D range values of subsequent
images; generating (X,Y,Z) values in local three-dimensional coordinate systems
for each 3D range image; selecting a reference three-dimensional world
coordinate system against which the overall spatial information of the scene can
be correctly presented; transforming the generated (X,Y,Z) values from each of
the local three-dimensional coordinate systems to the selected reference three-dimensional
world coordinate system; warping the transformed (X,Y,Z) images
to correct for geometric distortion caused by the perspective projection, and
forming a plurality of warped (X,Y,Z) images; registering adjacent warped
(X,Y,Z) images; and forming a three-dimensional panorama, i.e., an (X,Y,Z)
panorama, using the warped (X,Y,Z) images.
The advantage of the invention is that it allows for merging two
adjacent range images composed of relative range values, that is, where the range
information returned by the camera does not contain any information about
absolute distance to the camera. Instead, the relative range information is
incremented by a constant determined according to the invention, and the merging
of the adjacent images incorporates the determined constant.

These and other aspects, objects, features and advantages of the 
present invention will be more clearly understood and appreciated from a review 
of the following detailed description of the preferred embodiments and appended 
claims, and by reference to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 illustrates the method steps involved in the formation of a 
three-dimensional panorama with color texture mapped for a graphics display in 
accordance with the invention; 

FIG. 2 depicts an image bundle used in connection with the method
of Figure 1;

FIGS. 3A and 3B illustrate the relationship between the image
plane and a cylindrically warped image plane; 

FIG. 4 shows the registration of adjacent images; 

FIGS. 5A and 5B show the relationship between range values in
adjacent images; 

FIG. 6 illustrates the method steps involved in the determination of 
relative range differences between adjacent images; 

FIG. 7 illustrates the transformation of the corrected 3D data from
each of the individual three-dimensional coordinate systems determined by the
camera orientation to a common reference three-dimensional coordinate system;

FIG. 8 shows a block diagram of a three-dimensional panoramic
imaging system according to the present invention;

FIG. 9 is a block diagram of a known SRI camera that can be
used to capture a bundle of images; and

FIG. 10 is an illustration of the use of an SRI camera in a
panoramic imaging application.

DETAILED DESCRIPTION OF THE INVENTION 

Because panoramic methods and imaging technology are well 
known, the present description will be directed in particular to elements forming 
part of, or cooperating more directly with, apparatus in accordance with the 
present invention. Elements not specifically shown or described herein may be 
selected from those known in the art. Certain aspects of the embodiments to be 
described may be provided in software. Given the system as shown and described 
according to the invention in the following materials, software not specifically 
shown, described or suggested herein that is useful for implementation of the 
invention is conventional and within the ordinary skill in such arts. 

It is helpful to first review the principles and techniques involved 
in scannerless range imaging. Accordingly, referring first to Figure 9, an SRI 
camera 10 is shown as a laser radar that is used to illuminate a scene 12 and then 
to capture an image bundle comprising a minimum of three images of the scene 
12. An illuminator 14 emits a beam of electromagnetic radiation whose frequency 
is controlled by a modulator 16. Typically, in the prior art, the illuminator 14 is a 
laser device which includes an optical diffuser in order to effect a wide-field 
illumination. The modulator 16 provides an amplitude varying sinusoidal 
modulation. The modulated illumination source is modeled by: 

$$L(t) = \mu_L + \eta\sin(2\pi\lambda t) \qquad \text{(Eq. 1)}$$

where $\mu_L$ is the mean illumination, $\eta$ is the modulus of the illumination source,
and $\lambda$ is the modulation frequency applied to the illuminator 14. The modulation
frequency is sufficiently high (e.g., 12.5 MHz) to attain sufficiently accurate range
estimates. The output beam 18 is directed toward the scene 12 and a reflected
beam 20 is directed back toward a receiving section 22. As is well known, the
reflected beam 20 is a delayed version of the transmitted output beam 18, with the
amount of phase delay being a function of the distance of the scene 12 from the
range imaging system. The reflected beam 20 strikes a photocathode 24 within an
image intensifier 26, thereby producing a modulated electron stream proportional
to the input amplitude variations. The output of the image intensifier 26 is
modeled by:

$$M(t) = \mu_M + \gamma\sin(2\pi\lambda t) \qquad \text{(Eq. 2)}$$

where $\mu_M$ is the mean intensification, $\gamma$ is the modulus of the intensification and
$\lambda$ is the modulation frequency applied to the intensifier 26. The purpose of the
image intensifier is not only to intensify the image, but also to act as a frequency 
mixer and shutter. Accordingly, the image intensifier 26 is connected to the 
modulator 16, causing the gain of a microchannel plate 30 to modulate. The 
electron stream from the photocathode 24 strikes the microchannel plate 30 and is 
mixed with a modulating signal from the modulator 16. The modulated electron 
stream is amplified through secondary emission by the microchannel plate 30. 
The intensified electron stream bombards a phosphor screen 32, which converts 
the energy into a visible light image. The intensified light image signal is 
captured by a capture mechanism 34, such as a charge-coupled device (CCD). 
The captured image signal is applied to a range processor 36 to determine the 
phase delay at each point in the scene. The phase delay term $\omega$ of an object at a
range $\rho$ meters is given by:

$$\omega = \frac{4\pi\lambda\rho}{c} \bmod 2\pi \qquad \text{(Eq. 3)}$$

where c is the velocity of light in a vacuum. Consequently, the reflected light at 
this point is modeled by: 

$$R(t) = \mu_L + \kappa\sin(2\pi\lambda t + \omega) \qquad \text{(Eq. 4)}$$

where $\kappa$ is the modulus of illumination reflected from the object. The pixel
response P at this point is an integration of the reflected light and the effect of the 
intensification: 

$$P = \int_0^{2\pi} R(t)M(t)\,dt = 2\mu_L\mu_M\pi + \kappa\pi\gamma\cos(\omega) \qquad \text{(Eq. 5)}$$
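Eq. 5 can be checked numerically by integrating the product of Eq. 4 and Eq. 2. The sketch below assumes, as the 0 to 2π limits of Eq. 5 imply, that time is expressed in the normalized phase variable 2πλt, so one modulation cycle spans [0, 2π]. The parameter values in the check are arbitrary, and both function names are hypothetical.

```python
import numpy as np

def pixel_response_numeric(mu_l, kappa, mu_m, gamma, omega, samples=4096):
    """Integrate R * M over one modulation cycle (normalized phase 0..2*pi)."""
    step = 2.0 * np.pi / samples
    # Midpoint rule on a uniform grid; it is exact (to rounding) for these
    # low-order trigonometric integrands because they are periodic.
    phase = (np.arange(samples) + 0.5) * step
    reflected = mu_l + kappa * np.sin(phase + omega)   # Eq. 4
    intensifier = mu_m + gamma * np.sin(phase)          # Eq. 2
    return float(np.sum(reflected * intensifier) * step)

def pixel_response_closed_form(mu_l, kappa, mu_m, gamma, omega):
    """Right-hand side of Eq. 5."""
    return 2.0 * mu_l * mu_m * np.pi + kappa * np.pi * gamma * np.cos(omega)
```

The cross terms average out over a full cycle, and only the constant term and the product of the two sinusoids, which contributes πκγcos(ω), survive. That is why the pixel value carries the phase ω.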

In the range imaging system disclosed in the aforementioned U.S. 
Patent No. 4,935,616, which is incorporated herein by reference, a reference 
image is captured during which time the micro-channel plate is not modulated, but 
rather kept at a mean response. The range is estimated for each pixel by 
recovering the phase term as a function of the value of the pixel in the reference
image and the phase image. 

A preferred, more robust approach for recovering the phase term is
described in U.S. Patent No. 6,118,946, entitled "Method and Apparatus for
Scannerless Range Image Capture Using Photographic Film", which is
incorporated herein by reference. Instead of collecting a phase image and a
reference image, this approach collects at least three phase images (referred to as
an image bundle). This approach shifts the phase of the intensifier 26 relative to
the phase of the illuminator 14, and each of the phase images has a distinct phase
offset. For this purpose, the range processor 36 is suitably connected to control
the phase offset of the modulator 16, as well as the average illumination level and
such other capture functions as may be necessary. If the image intensifier 26 (or
laser illuminator 14) is phase shifted by $\theta_i$, the pixel response from equation (5)
becomes:

$$P_i = 2\mu_L\mu_M\pi + \kappa\pi\gamma\cos(\omega + \theta_i) \qquad \text{(Eq. 6)}$$

It is desired to extract the phase term $\omega$ from the expression.
However, this term is not directly accessible from a single image. In equation (6)
there are three unknown values and the form of the equation is quite simple. As a
result, mathematically only three samples (from three images) are required to
retrieve an estimate of the phase term, which is proportional to the distance of an
object in the scene from the imaging system. Therefore, a set of three images
captured with unique phase shifts is sufficient to determine $\omega$. For simplicity, the
phase shifts are given by $\theta_k = 2\pi k/3$, $k = 0, 1, 2$. In the following description, an
image bundle shall be understood to constitute a collection of images which are of
the same scene, but with each image having a distinct phase offset obtained from
the modulation applied to the intensifier 26. It should also be understood that an
analogous analysis can be performed by phase shifting the illuminator 14 instead
of the intensifier 26. If an image bundle comprising more than three images is
captured, then the estimates of range can be enhanced by a least squares analysis
using a singular value decomposition (see, e.g., W.H. Press, B.P. Flannery, S.A.
Teukolsky and W.T. Vetterling, Numerical Recipes: The Art of Scientific
Computing, Cambridge University Press, Cambridge, 1986).
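With the three offsets θ_k = 2πk/3, the samples of Eq. 6 can be inverted in closed form. The cosines and sines of the three offsets each sum to zero, and they are mutually orthogonal with squared sums of 3/2. Projecting the samples onto cos θ_k therefore gives (3/2)κπγ cos ω, and projecting them onto sin θ_k gives -(3/2)κπγ sin ω. The text does not spell out this inversion. The sketch below is one way to do it under the pixel model of Eq. 6, and `phase_from_three` is a hypothetical name.

```python
import numpy as np

def phase_from_three(p0, p1, p2):
    """Recover omega from three samples taken at phase offsets 2*pi*k/3.

    Works element-wise, so p0, p1, p2 may be scalars or whole images.
    Returns omega wrapped to [0, 2*pi).
    """
    thetas = 2.0 * np.pi * np.arange(3) / 3.0
    stack = np.stack([np.asarray(p, dtype=float) for p in (p0, p1, p2)])
    # Projections onto cos/sin of the offsets; the constant term cancels.
    c = np.tensordot(np.cos(thetas), stack, axes=1)   # (3/2) * kappa*pi*gamma * cos(omega)
    s = np.tensordot(np.sin(thetas), stack, axes=1)   # -(3/2) * kappa*pi*gamma * sin(omega)
    # Four-quadrant arctangent, then wrap into one cycle.
    return np.mod(np.arctan2(-s, c), 2.0 * np.pi)
```

The closed form avoids a linear solve per pixel. The general least-squares route is still needed when a bundle holds more than three images or uses unequally spaced offsets.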

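If images are captured with $n \ge 3$ distinct phase offsets of the
intensifier (or laser or a combination of both), these images form an image bundle.
Applying Equation (6) to each image in the image bundle and expanding the
cosine term (i.e., $P_i = 2\mu_L\mu_M\pi + \kappa\pi\gamma(\cos\omega\cos\theta_i - \sin\omega\sin\theta_i)$) results in
the following system of $n$ linear equations in three unknowns at each point:

$$\begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_n \end{bmatrix} = \begin{bmatrix} 1 & \cos\theta_1 & -\sin\theta_1 \\ 1 & \cos\theta_2 & -\sin\theta_2 \\ \vdots & \vdots & \vdots \\ 1 & \cos\theta_n & -\sin\theta_n \end{bmatrix} \begin{bmatrix} \Lambda_1 \\ \Lambda_2 \\ \Lambda_3 \end{bmatrix} \qquad \text{(Eq. 7)}$$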
where $\Lambda_1 = 2\mu_L\mu_M\pi$, $\Lambda_2 = \kappa\pi\gamma\cos\omega$, and $\Lambda_3 = \kappa\pi\gamma\sin\omega$. This system of
equations is solved by a singular value decomposition to yield the vector
$\Lambda = [\Lambda_1, \Lambda_2, \Lambda_3]^T$. Since this calculation is carried out at every (x, y) location in
the image bundle, $\Lambda$ is really a vector image containing a three-element vector at
every point. The phase term $\omega$ is computed at each point using a four-quadrant
arctangent calculation:

$$\omega = \tan^{-1}(\Lambda_3, \Lambda_2) \qquad \text{(Eq. 8)}$$

The resulting collection of phase values at each point forms the phase image. 
Once phase has been determined, range r can be calculated by: 



$$r = \frac{c\,\omega}{4\pi\lambda} \qquad \text{(Eq. 9)}$$



Equations (1)-(9) thus describe a method of estimating range using an image
bundle with at least three images (i.e., $n \ge 3$) corresponding to distinct phase offsets
of the intensifier and/or illuminator.
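The per-pixel solve described above can be vectorized over a whole bundle. The following is a minimal sketch, assuming the bundle is stored as an array of shape (n, H, W) with known offsets θ_i in radians. It builds the design matrix for P_i = Λ1 + Λ2 cos θ_i - Λ3 sin θ_i, solves all pixels at once with `numpy.linalg.lstsq` (an SVD-based least-squares solver), applies the four-quadrant arctangent, and converts phase to range with r = cω/(4πλ). `range_from_bundle` is a hypothetical name.

```python
import numpy as np

def range_from_bundle(images, thetas, mod_freq_hz, c=299_792_458.0):
    """Estimate phase and range per pixel from an n-image bundle.

    images: array of shape (n, H, W), one image per phase offset, n >= 3.
    thetas: the n phase offsets in radians.
    Returns (omega, r): phase wrapped to [0, 2*pi) and range in metres.
    """
    images = np.asarray(images, dtype=float)
    thetas = np.asarray(thetas, dtype=float)
    n = images.shape[0]
    if n < 3 or thetas.shape != (n,):
        raise ValueError("need at least three images and one offset per image")
    # Design matrix: P_i = L1 + L2*cos(theta_i) - L3*sin(theta_i).
    design = np.column_stack([np.ones(n), np.cos(thetas), -np.sin(thetas)])
    # One least-squares solve for every pixel (each column of the RHS is a pixel).
    lam, *_ = np.linalg.lstsq(design, images.reshape(n, -1), rcond=None)
    omega = np.mod(np.arctan2(lam[2], lam[1]), 2.0 * np.pi)   # four-quadrant arctangent
    r = omega * c / (4.0 * np.pi * mod_freq_hz)                # phase to range
    return omega.reshape(images.shape[1:]), r.reshape(images.shape[1:])
```

Because ω is only known modulo 2π, the recovered range repeats every c/(2λ). At the 12.5 MHz example frequency that is about 12 m, so a target at 13 m would be reported at roughly 1 m. This is the first type of ambiguity described in the background.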

Figure 10 shows the use of an SRI camera 10 in a panoramic 
imaging application. A single SRI camera 10 is mounted to pivot about a Y-axis 
(vertical axis) 50 through a number of capture positions 54, each separated by an 
angle $\theta$ that provides an overlapping field of view between neighboring positions.
The Y-axis 50 is arranged to be coincident with a focal point 50 of the lens 52 of
the SRI camera 10. In this manner, a sequence of image inputs is obtained as the
SRI camera 10 is rotated around the focal point 50 of the camera lens 52, causing 
each successive image to slightly overlap its neighboring image. Since each input 
corresponds to an image bundle, a plurality of image bundles of the scene are 
acquired by rotating the SRI camera 10 about its Y-axis (vertical axis) 50, wherein 
there is an overlap region between adjacent image bundles. Although an SRI 
(scannerless range imaging) camera is used in the preferred embodiment, it should 
be understood that the invention may be used in connection with other types of 
range imaging systems, such as scanned systems, and the claims, unless 
specifically directed to SRI systems, are intended to read without limitation on 
any kind of range imaging system. Similarly, although the collection of image 
bundles is the preferred embodiment, it should be understood that the invention is 
not limited to any specific image collection. Moreover, there may be applications, 
e.g., in creating virtual images of small objects, where the SRI camera may be 
stationary and the "scene" may be rotated, e.g., on a turntable, in order to obtain 
overlapping images. 

Referring now to Figure 1, an image processing method 100 is
shown according to the invention for deriving a three-dimensional panorama from
a plurality of image bundles of a scene generated from an SRI camera, including
the steps of acquiring an image bundle 102 with the SRI camera, determining a
range 104, capturing all image bundles 106, moving the SRI camera to an adjacent
position 108, determining a warp function and a registration point 110,
determining relative range differences 112, applying the relative range differences
114, selecting a world coordinate system 116, transforming each range image to
the world coordinate system 118, warping the images 120, registering the adjacent
warped images 122, and forming a 3D panoramic image 124.
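Steps 104, 116 and 118 listed above can be illustrated with a short sketch. It makes several assumptions: a pinhole camera, range values measured orthogonally to the image plane, a camera that turns by a fixed angle about the Y-axis between capture positions, and a world frame aligned with the first capture position. The rotation sign convention and the names `local_xyz`, `y_rotation` and `to_world` are illustrative assumptions. They are not the patented construction of the homogeneous transformation matrix.

```python
import numpy as np

def local_xyz(depth, f, px, py):
    """(X, Y, Z) in the camera's local frame from an orthogonal-depth image.

    Pinhole assumption: pixel (x, y), measured from the image centre, images
    the point X = d*x*px/f, Y = d*y*py/f, Z = d. Returns shape (H, W, 3).
    """
    h, w = depth.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    x -= (w - 1) / 2.0
    y -= (h - 1) / 2.0
    return np.stack([depth * x * px / f, depth * y * py / f, depth], axis=-1)

def y_rotation(angle):
    """4x4 homogeneous transform rotating about the vertical (Y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [-s, 0.0, c, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def to_world(xyz, position_index, step_angle):
    """Map a local (H, W, 3) XYZ image from capture position i to the world frame.

    The world frame is the local frame of position 0; the camera is assumed
    to rotate by step_angle about Y between consecutive positions.
    """
    T = y_rotation(position_index * step_angle)
    homog = np.concatenate([xyz, np.ones(xyz.shape[:-1] + (1,))], axis=-1)
    # Row-vector form: v' = v @ T^T is the same as T @ v for each point.
    return (homog @ T.T)[..., :3]
```

Because the transform is a pure rotation about the nodal axis, distances from the rotation centre are preserved. Only the direction in which each point is expressed changes.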

The image processing method 100 forms a complete three-dimensional
scene panorama for virtual reality visualization. The method 100
uses an image bundle 102 to generate a corresponding spatial image, e.g., an
(X,Y,Z) image, in step 104. An inquiry of whether all image bundles have been
captured is performed in step 106. A negative response to the inquiry causes the SRI
camera to move to an adjacent position 108. A warping function and registration
point are computed in step 110 and used in step 112 to determine the differences in
constant offsets of the relative 3D range values between image bundles captured from
adjacent positions. Once these differences have been determined, they are applied to
the spatial images in step 114. An arbitrary reference three-dimensional world coordinate
system is established in step 116 to uniquely describe the spatial property of the
scene captured. All the estimated spatial images are transformed in step 118 to
the reference three-dimensional world coordinate system with a homogeneous
transformation matrix that is constructed based on the information of the capturing
device. The transformed spatial images are stitched together to form a spatial
panorama after a cylindrical warping procedure 120 and a registration process
122. Likewise, the intensity images are stitched together to form an intensity
panorama in step 124 after the same procedures. Both spatial and intensity
panoramas are used in a virtual display with no further transformation operation
needed.

The notion of an image bundle is an important aspect of a preferred
range estimation method using an SRI camera. As shown in relation to Figure 2,
an image bundle 200 includes a combination of images captured by the SRI
system as well as information pertinent to the individual images and information
common to all the images. The image bundle contains two types of images: range
images 202 related to the image capture portion of the SRI process and an
intensity image 204, which may be a color image. Common information 206 in
the image bundle 200 would typically include the number of range images in the
bundle (three or more) and the modulation frequency used by the SRI system.
Other information might be the number of horizontal and vertical pixels in the
images, and/or data related to camera status at the time of the image capture.
Image-specific information will include the phase offset 1...N used for each
(1...N) of the individual range images 202. The image bundle 200 includes a
minimum of three such images, each of which is monochrome. The additional
intensity image 204 is an image using an optical channel of the SRI camera that
does not contain range capture components. For example, as disclosed in the
aforementioned Serial No. 09/572,522, which is incorporated herein by reference,
a beamsplitter is used to establish two optical paths: one path contains the range
imaging elements and the other path contains regular optics for transmitting the
intensity (e.g., color) image. An optical network (including light control means
such as a shutter) recombines the image paths toward a single image responsive
element, and a range image and an intensity image are separately, and sequentially,
captured. Alternatively, the range imaging elements and regular optics may be
interchanged in a single optical path. Although the intensity image may be a color
image, it is preferably, but not necessarily, the same size as the range images 202.
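The image bundle 200 is essentially a data structure with common and image-specific fields. A minimal sketch of such a container follows. The class name `ImageBundle`, its field names, and the use of radians for phase offsets are illustrative assumptions. The checks enforce only what the text states: at least three range images, one phase offset per range image, and range images of a common size.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class ImageBundle:
    """Hypothetical container mirroring the image bundle 200 of Figure 2."""

    range_images: List[np.ndarray]                     # N >= 3 monochrome images (202)
    phase_offsets: List[float]                         # image-specific: offset per range image, radians
    modulation_freq_hz: float                          # common information (206)
    intensity_image: Optional[np.ndarray] = None       # optional intensity/color image (204)
    camera_status: dict = field(default_factory=dict)  # other common data

    def __post_init__(self) -> None:
        if len(self.range_images) < 3:
            raise ValueError("an image bundle needs at least three range images")
        if len(self.phase_offsets) != len(self.range_images):
            raise ValueError("need exactly one phase offset per range image")
        if len({np.shape(img) for img in self.range_images}) != 1:
            raise ValueError("all range images must have the same size")

    @property
    def num_range_images(self) -> int:
        """Common information: number of range images in the bundle."""
        return len(self.range_images)

    @property
    def size(self) -> tuple:
        """Common information: (vertical, horizontal) pixel counts."""
        return tuple(np.shape(self.range_images[0]))
```

Keeping the offsets alongside the images lets a range estimator, such as the least-squares solve of Eq. 7, consume a bundle without any side channel.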

Once an image bundle has been acquired, it is used to determine
3D range 104. Referring to Figures 3A and 3B, because images captured from a
camera 300 are projections of objects through the nodal point of the taking lens 
into a plane, an inherent distortion exists. Objects near the center of an image 
appear smaller than objects near the edges; this distortion is evident in regions of 
overlap between images 302. In order to create a spatial panorama containing no 
distortion, the intensity images and 3D range images must eventually be warped. 

One such warp that corrects for the distortion (but not the only such 
warp) is a cylindrical warp 110, where the images are warped onto a cylinder 304 
about the vertical axis of the cylinder. This warping technique is described in 
detail in the aforementioned copending U.S. Patent Application Serial No. 
09/383,573, "Method For Creating Environment Map Containing Information 
Extracted From Stereo Image Pairs", which is incorporated herein by reference. 
Briefly described, the warp can be described by a function $W(x_p, y_p)$ that maps
pixel 324 $(x_p, y_p)$ in the image plane 318 to pixel 312 $(x_c, y_c)$ in the warped plane
310. The cylindrical warping function $W(x_p, y_p)$ can be determined in the
following manner: suppose the real world point 306 is projected through the rear
nodal point 308 of the taking lens onto the cylinder 304 at point 312 $(x_c, y_c)$, where
$x_c$ is the horizontal pixel coordinate 314 and $y_c$ is the vertical pixel coordinate 316
(relative to the orthogonal projection of the nodal point 308 onto the image plane
318). The intensity/range value assigned to the cylindrically warped image at
point 312 $(x_c, y_c)$ should be the intensity/range value found at point 324 $(x_p, y_p)$ in
the planar image 318, where $x_p$ is the horizontal pixel coordinate 320 and $y_p$ is the
vertical pixel coordinate 322 of point 324. It can be shown that $(x_p, y_p)$ can be
computed in the following way:

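$$x_p = \frac{f}{p_x}\tan\!\left(\frac{x_c p_x}{f}\right), \qquad y_p = \begin{cases} \dfrac{y_c \tan(x_c p_x / f)}{\sin(x_c p_x / f)}, & x_c \neq 0 \\[1ex] y_c, & x_c = 0 \end{cases} \qquad \text{(Eq. 10)}$$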

where $p_x$ is the width of a pixel of the image plane 318 in the x-direction and $f$ is
the focal length of the taking lens. In general, $(x_p, y_p)$ will not be integer valued, so
it is appropriate to interpolate nearby intensity values. For range values it is only
appropriate to assign the value of the pixel nearest $(x_p, y_p)$.
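The inverse mapping can be applied to every output pixel at once. The sketch below assumes that x_p = (f/p_x) tan(x_c p_x / f) and y_p = y_c / cos(x_c p_x / f). For x_c ≠ 0 the latter is the same quantity as y_c tan(a)/sin(a), and it also covers x_c = 0. It further assumes that the image centre is the orthogonal projection of the nodal point and that f and p_x share the same units. Following the text, intensity values are bilinearly interpolated while range values take the nearest pixel. Output pixels that fall outside the planar image are set to NaN. `cylindrical_warp` is a hypothetical name.

```python
import numpy as np

def cylindrical_warp(planar, f, px, is_range=False):
    """Warp a 2-D planar image onto a cylinder about its vertical axis.

    For each cylinder pixel (x_c, y_c) the source position in the planar image
    is x_p = (f/px) * tan(x_c*px/f), y_p = y_c / cos(x_c*px/f), with both
    coordinate pairs measured from the image centre.
    """
    h, w = planar.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yc, xc = np.mgrid[0:h, 0:w].astype(float)
    yc -= cy
    xc -= cx
    ang = xc * px / f                       # angle subtended on the cylinder
    xp = (f / px) * np.tan(ang) + cx        # back to array column index
    yp = yc / np.cos(ang) + cy              # back to array row index
    out = np.full((h, w), np.nan)
    if is_range:
        # Range: copy the nearest pixel, never blend values.
        xi = np.rint(xp).astype(int)
        yi = np.rint(yp).astype(int)
        ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        out[ok] = planar[yi[ok], xi[ok]]
    else:
        # Intensity: bilinear interpolation of the four surrounding pixels.
        ok = (xp >= 0) & (xp <= w - 1) & (yp >= 0) & (yp <= h - 1)
        x0 = np.clip(np.floor(xp[ok]).astype(int), 0, w - 2)
        y0 = np.clip(np.floor(yp[ok]).astype(int), 0, h - 2)
        fx, fy = xp[ok] - x0, yp[ok] - y0
        top = planar[y0, x0] * (1 - fx) + planar[y0, x0 + 1] * fx
        bottom = planar[y0 + 1, x0] * (1 - fx) + planar[y0 + 1, x0 + 1] * fx
        out[ok] = top * (1 - fy) + bottom * fy
    return out
```

Nearest-neighbour sampling for range avoids inventing depths between a foreground edge and the background behind it, which interpolation would otherwise do.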

Referring to Figure 4, the registration point 400 of adjacent images
402 and 404 warped by the warping function W must be found. Any of a variety
of image alignment techniques can be used to register the warped left 402 and
right 404 images; e.g., see C. D. Kuglin and D. C. Hines, "The Phase Correlation
Image Alignment Method", Proc. 1975 International Conference on Cybernetics
and Society, pp. 163-165, 1975, which is incorporated herein by reference.
Although the adjacent warped intensity images and the adjacent warped range
images may be separately registered, the coordinates of the range images
correspond exactly to those of the intensity images. Therefore, common values in the
adjacent warped intensity images are registered, and the registration points in the
warped range images are defined to be in the same locations as those used for
registering the intensity images. The output of the image alignment method yields
the overlap region 406 between the left 402 and right 404 warped intensity
images. The registration point 400 is taken to be the upper left-hand corner of the
overlap region 406 but, in general, can be defined as any point which identifies
the location of the overlap region.
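The phase correlation method cited above recovers a translation from the normalized cross-power spectrum of two images. The sketch below is minimal and assumes a pure integer translation with wrap-around (circular) boundaries, which real overlapping images do not have. In practice the images would usually be windowed or padded, and converting the estimated shift into the registration point 400 depends on the image layout. `phase_correlation_shift` is a hypothetical name.

```python
import numpy as np

def phase_correlation_shift(ref, moving):
    """Estimate the integer (dy, dx) such that moving ~= np.roll(ref, (dy, dx), (0, 1))."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(moving)
    cross = np.conj(F1) * F2
    # Keep only the phase of the cross-power spectrum (guard against zeros).
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    h, w = ref.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Normalizing away the magnitude makes the correlation surface a sharp peak that is largely insensitive to overall brightness differences between the two exposures. This is one reason the technique suits registering the warped intensity images.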
In order to determine the difference in constant range offsets
between subsequent images, we employ an optimization procedure. Referring to
Figures 5A and 5B, we compare adjacent 3D range images 500 (left) and 502
(right). If we consider that pixel $(x_1, y_1)$ 504 in the left image 500 and pixel $(x_2, y_2)$
506 in the right image contain relative range values (say $d_1$ and $d_2$, respectively) to
the same real world point 306, then we know by definition that $d_1$ is measured
orthogonally to the left image plane 500, and that $d_2$ is measured orthogonally to
the right image plane 502. In addition, one caveat of the SRI cameras is that 3D
range values are known only in relation to one another, not as absolute distances
to the image plane. Consequently, there is an unknown constant offset that must be
accounted for in order to correctly compare the 3D range values in the left
500 and right 502 images and so recover corrected relative scene spatial information.

Figure 5B illustrates the difference in the local coordinate systems that describe the range values in the left 500 and right 502 images. Looking down the nodal axis of the SRI camera, 508 is parallel to the x-axis of the left image 500, and 510 is parallel to the x-axis of the right image 502. The real-world point 306 projected into both images is found to have a 3D range value 512 of $d_1$ in the left image 500 and a 3D range value 514 of $d_2$ in the right image 502. The angle 518 between the image planes is known a priori and is denoted $\theta$. If the 3D range values $d_1$ and $d_2$ are known absolutely, then it can easily be shown that:

$$ d_2 = d_1\left(\frac{\beta}{f}\sin\theta + \cos\theta\right), \qquad \text{(Eq. 12)} $$

where $f$ is the focal length of the SRI camera and $\beta$ 516 is the horizontal distance from the center of the image to the pixel containing the projection of 306. Since the 3D range values $d_1$ and $d_2$ are not known absolutely, the relationship between $d_1$ and $d_2$ becomes:

$$ d_2 = (d_1 + \alpha)\left(\frac{\beta}{f}\sin\theta + \cos\theta\right), \qquad \text{(Eq. 13)} $$

where $\alpha$ is the unknown constant offset between the relative 3D range values.
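
The step behind "it can easily be shown" in Eq. 12 can be filled in under a pinhole-camera assumption that this section does not state explicitly. Assume the left camera has its X axis horizontal and its Z axis along the optical axis, and that the right camera is rotated by $\theta$ about the vertical axis through the shared nodal point. The sign of $\theta$ is taken so that the right camera's depth axis picks up $+\sin\theta$ of the left X coordinate. A point imaged at horizontal offset $\beta$ in the left image then satisfies:

$$ \begin{aligned} z_1 &= d_1, \qquad x_1 = \frac{\beta}{f}\,d_1, \\ d_2 = z_2 &= x_1\sin\theta + z_1\cos\theta = d_1\left(\frac{\beta}{f}\sin\theta + \cos\theta\right), \end{aligned} $$

which is Eq. 12. Replacing the unknown absolute range by the relative value plus the offset, $d_1 + \alpha$, gives Eq. 13.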

Figure 6 describes the process 112 (referred to in Figure 1) whereby an estimate for $\alpha$ is determined. In 600, an initial estimate for $\alpha$ is chosen; e.g., $\alpha = 0$. In 602, the right-hand side of Equation 13 is evaluated, yielding $\hat{d}_2$, an estimate of the 3D range value in the right image. In 604, the 3D range images are warped according to the warp function W and then registered using the predetermined registration point 400. The error between the predicted $\hat{d}_2$ values and the actual $d_2$ values in the overlap region 406 of the warped, registered images is computed 606 by calculating the difference $\hat{d}_2 - d_2$ at each pixel, squaring this difference, and then summing the squared difference values over all overlapping pixels. An inquiry is made 608 as to whether the error (measured by the sum of squared differences) is acceptable. If the result of the inquiry is negative, a new estimate for $\alpha$ is chosen according to some optimization scheme 610 (e.g., Newton's method, line search, etc.; see Fletcher, Practical Methods of Optimization, 2nd Edition, John Wiley & Sons, 1987). A good choice is the Levenberg-Marquardt optimization scheme, which is described in the aforementioned Fletcher reference (pages 100-107). When the result of the inquiry 608 is finally affirmative, the current estimate for $\alpha$ is chosen to be the relative range difference between the two images. According to 114 (referring to Figure 1), the relative range difference $\alpha$ is added to each 3D range value in the left image 500. Note that once the relative range difference has been applied, the range values in the left 500 and right 502 images will not be absolute; rather, they will still be relative, but with consistent constant offsets.
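
The loop of Figure 6 can be sketched as follows, under two simplifying assumptions. First, the overlapping pixels are taken as already warped and registered, so that `d1[n]`, `d2[n]`, and `beta[n]` refer to the same scene point. Second, a plain Newton update stands in for the optimization scheme 610, whereas the text suggests Levenberg-Marquardt. The function name `estimate_range_offset` and its tolerance parameters are hypothetical. Because Eq. 13 is linear in $\alpha$, the summed squared error is quadratic in $\alpha$, so the Newton step reaches the minimizer after one iteration. An iterative scheme such as Levenberg-Marquardt would matter more if the model were extended with further nonlinear parameters.

```python
import numpy as np

def estimate_range_offset(d1, d2, beta, f, theta, alpha0=0.0, tol=1e-9, max_iter=50):
    """Estimate the constant offset alpha of Eq. 13 by minimizing the summed
    squared difference between predicted and measured right-image ranges.

    Assumes d1, d2, beta are equal-length arrays over the registered overlap
    region (a hypothetical simplification of steps 600-610).
    """
    d1, d2, beta = (np.asarray(a, dtype=float) for a in (d1, d2, beta))
    k = (beta / f) * np.sin(theta) + np.cos(theta)   # per-pixel factor of Eq. 13
    alpha = float(alpha0)                             # step 600: initial estimate
    for _ in range(max_iter):
        d2_hat = (d1 + alpha) * k                     # step 602: predicted d2
        resid = d2_hat - d2                           # step 606: per-pixel difference
        grad = 2.0 * np.sum(resid * k)                # dE/d(alpha)
        hess = 2.0 * np.sum(k * k)                    # d2E/d(alpha)2, constant here
        step = grad / hess
        alpha -= step                                 # step 610: Newton update
        if abs(step) < tol:                           # step 608: accept once alpha settles
            break
    sse = float(np.sum(((d1 + alpha) * k - d2) ** 2))
    return alpha, sse

# Synthetic check with a known offset of 0.75.
rng = np.random.default_rng(0)
d1 = rng.uniform(1.0, 5.0, 200)
beta = rng.uniform(-100.0, 100.0, 200)
f, theta, true_alpha = 500.0, np.deg2rad(20.0), 0.75
d2 = (d1 + true_alpha) * ((beta / f) * np.sin(theta) + np.cos(theta))
alpha, sse = estimate_range_offset(d1, d2, beta, f, theta)
d1_corrected = d1 + alpha                             # step 114: apply offset to left image
print(alpha, sse)
```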

Once the relative range differences have been applied to all of the 3D range images, the resulting corrected 3D values are used to form spatial images (X,Y,Z) for the scene. It should be noted that the resulting spatial images are valid for a local three-dimensional coordinate system only. That is, for image 500, the (X,Y,Z) values are given with respect to local three-dimensional coordinate system ${}^{1}X{}^{1}Y{}^{1}Z$; for image 502, the 3D values are given with respect to local three-dimensional coordinate system ${}^{2}X{}^{2}Y{}^{2}Z$. If a panoramic image sequence is composed of N pairs of images, there will be N different three-dimensional coordinate systems with respect to which the (X,Y,Z) values are computed.
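
This section does not spell out how the corrected range values become (X,Y,Z) images. A common choice, assumed here rather than taken from the text, is pinhole back-projection. Since each range value is measured orthogonally to the image plane, $Z = d$, $X = x\,d/f$, and $Y = y\,d/f$, where $(x, y)$ is the pixel's offset from the image center in the same units as $f$. The function `range_to_xyz` and its optional center parameters are hypothetical.

```python
import numpy as np

def range_to_xyz(d, f, cx=None, cy=None):
    """Back-project an H x W range image into an H x W x 3 (X, Y, Z) image.

    Assumed pinhole model: Z = d, X = x * d / f, Y = y * d / f, with (x, y) the
    offset from the image center (cx, cy) in pixels and f in pixels.
    """
    d = np.asarray(d, dtype=float)
    h, w = d.shape
    cx = (w - 1) / 2 if cx is None else cx
    cy = (h - 1) / 2 if cy is None else cy
    cols, rows = np.meshgrid(np.arange(w), np.arange(h))  # both shaped (h, w)
    x = cols - cx
    y = rows - cy
    return np.stack([x * d / f, y * d / f, d], axis=-1)

xyz = range_to_xyz(np.full((3, 3), 2.0), f=1.0)
print(xyz[1, 2])  # pixel one column right of center
```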

Figure 7 includes an example coordinate system transformation 700, a rotation direction 702, a reference coordinate system 704, a pre-transformation (A) coordinate system 706, a pre-transformation (B) coordinate system 708, an angle 710 between coordinate systems 706 and 704, and an angle 712 between coordinate systems 708 and 704.

Figure 8 shows a three-dimensional panoramic imaging system 800, including a panoramic 3D capturing system 802, a reference coordinate system 804, an image stitching system 806, a graphics display system 808, a plurality of intensity (R,G,B) images 810, a plurality of spatial (X,Y,Z) images 812, a sequence of transformed images 814, a stitched spatial panorama 816, and a stitched intensity panorama 818.

In operation, the three-dimensional panoramic imaging system 800 enables the 3D panoramic capturing system 802 to produce a sequence of three-dimensional (X,Y,Z) images 812 as well as a sequence of (R,G,B) images 810. In accordance with the present invention, each of the (X,Y,Z) images generated from the captured sequence is transformed to a common three-dimensional coordinate system 804 from the local three-dimensional coordinate system in which the corresponding (R,G,B) image is taken and the original (X,Y,Z) image is computed. The transformed (X,Y,Z) images in a sequence are stitched together in the image stitching system 806, producing a stitched (X,Y,Z) panorama 816. The intensity (R,G,B) images are stitched together in the image stitching system 806, producing an (R,G,B) panorama 818. The stitched (X,Y,Z) panorama 816 and (R,G,B) panorama 818 are fed to a graphics display system 808 to generate a virtual world.

In accordance with the present invention, a common reference three-dimensional coordinate system (i.e., a world coordinate system) is arbitrarily selected, and all the (X,Y,Z) values computed for all the image pairs are transformed from their original local three-dimensional coordinate systems to the selected world coordinate system. As an example, referring to Fig. 7, the coordinate system ${}^{2}X{}^{2}Y{}^{2}Z$ 704 is chosen as the world coordinate system, and all computed data are then transformed to the world coordinate system with the method described below.

For example, denote a three-dimensional point in local coordinate system $j$ by

$$ {}^{j}P = \left[{}^{j}X_p,\; {}^{j}Y_p,\; {}^{j}Z_p,\; 1\right]^{T}; \qquad \text{(Eq. 14)} $$

then the homogeneous transformation from local coordinate system $j$ to world coordinate system $i$ can be represented by

$$ {}^{i}P = \left[{}^{i}T_{j}\right]{}^{j}P. \qquad \text{(Eq. 15)} $$
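
Applying Eq. 15 to a whole spatial image amounts to appending a 1 to each (X,Y,Z) triple, as in Eq. 14, and multiplying by the 4x4 matrix. The NumPy sketch below assumes the image is stored as an H x W x 3 array. `transform_xyz_image` is a hypothetical name.

```python
import numpy as np

def transform_xyz_image(xyz, T):
    """Apply a 4x4 homogeneous transform (Eq. 15) to every pixel of an H x W x 3 image."""
    h, w, _ = xyz.shape
    pts = xyz.reshape(-1, 3)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # each row is [X, Y, Z, 1], as in Eq. 14
    out = homog @ np.asarray(T, dtype=float).T           # row-vector form of iP = [iTj] jP
    return out[:, :3].reshape(h, w, 3)                   # last row of T keeps the 1; drop it

T = np.eye(4)
T[:3, 3] = [1.0, 2.0, 3.0]                               # pure translation, for illustration
out = transform_xyz_image(np.zeros((2, 2, 3)), T)
print(out[0, 0])
```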
For cases such as that shown in Fig. 7, where only rotation about the Y axis exists, the transformation matrix is

$$ \left[{}^{i}T_{j}\right] = \begin{bmatrix} \cos\theta_{ji} & 0 & -\sin\theta_{ji} & 0 \\ 0 & 1 & 0 & 0 \\ \sin\theta_{ji} & 0 & \cos\theta_{ji} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \text{(Eq. 16)} $$



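where $\theta_{ji}$ is the rotation angle from local coordinate system $j$ to world coordinate system $i$ about the Y axis. In the more general case, the homogeneous transformation matrix is

$$ \left[{}^{i}T_{j}\right] = \begin{bmatrix} t_{11} & t_{12} & t_{13} & t_{14} \\ t_{21} & t_{22} & t_{23} & t_{24} \\ t_{31} & t_{32} & t_{33} & t_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \text{(Eq. 17)} $$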



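where

$$ \begin{aligned} t_{11} &= \cos\theta_{ji}\cos\kappa_{ji} \\ t_{12} &= \sin\omega_{ji}\sin\theta_{ji}\cos\kappa_{ji} + \cos\omega_{ji}\sin\kappa_{ji} \\ t_{13} &= -\cos\omega_{ji}\sin\theta_{ji}\cos\kappa_{ji} + \sin\omega_{ji}\sin\kappa_{ji} \\ t_{21} &= -\cos\theta_{ji}\sin\kappa_{ji} \\ t_{22} &= -\sin\omega_{ji}\sin\theta_{ji}\sin\kappa_{ji} + \cos\omega_{ji}\cos\kappa_{ji} \\ t_{23} &= \cos\omega_{ji}\sin\theta_{ji}\sin\kappa_{ji} + \sin\omega_{ji}\cos\kappa_{ji} \\ t_{31} &= \sin\theta_{ji} \\ t_{32} &= -\sin\omega_{ji}\cos\theta_{ji} \\ t_{33} &= \cos\omega_{ji}\cos\theta_{ji} \\ t_{14} &= x_{ji}, \quad t_{24} = y_{ji}, \quad t_{34} = z_{ji} \end{aligned} \qquad \text{(Eq. 18)} $$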

where $\theta_{ji}$ is the rotation angle from local coordinate system $j$ to world coordinate system $i$ about the Y axis, $\omega_{ji}$ is the rotation angle about the X axis, $\kappa_{ji}$ is the rotation angle about the Z axis, $x_{ji}$ is the translation between local coordinate system $j$ and world coordinate system $i$ along the X axis, $y_{ji}$ is the translation along the Y axis, and $z_{ji}$ is the translation along the Z axis.
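
Eqs. 16-18 translate directly into code. The sketch below builds the general matrix from Eq. 18 and checks two properties. Setting $\omega = \kappa = 0$ with zero translation should reproduce Eq. 16, and the rotation part should be orthonormal with determinant 1, as a rigid transform requires. The function names are hypothetical, and angles are in radians.

```python
import numpy as np

def homogeneous_transform(omega, theta, kappa, tx, ty, tz):
    """4x4 matrix of Eq. 17 with entries t_11..t_34 from Eq. 18 (radians)."""
    so, co = np.sin(omega), np.cos(omega)
    st, ct = np.sin(theta), np.cos(theta)
    sk, ck = np.sin(kappa), np.cos(kappa)
    return np.array([
        [ct * ck,  so * st * ck + co * sk,  -co * st * ck + so * sk, tx],
        [-ct * sk, -so * st * sk + co * ck,  co * st * sk + so * ck, ty],
        [st,       -so * ct,                 co * ct,                tz],
        [0.0,      0.0,                      0.0,                    1.0],
    ])

def rotation_about_y(theta):
    """Eq. 16: the special case with rotation about the Y axis only."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, -s, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [s, 0.0, c, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

R = homogeneous_transform(0.1, 0.2, 0.3, 0.0, 0.0, 0.0)[:3, :3]
print(np.allclose(R @ R.T, np.eye(3)))
```

Under the rotation-only setting of Fig. 7, a call such as `homogeneous_transform(0.0, theta_32, 0.0, 0.0, 0.0, 0.0)`, where `theta_32` is a hypothetical variable holding angle 710, would map points from system 3 into system 2.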

It should be pointed out that all coordinate systems are defined by the right-hand rule (as defined in Stewart, Calculus, 2nd Edition, Brooks/Cole, 1991, p. 639). Rotation angles $\omega$, $\theta$, and $\kappa$ are defined to be positive if they are counterclockwise when viewed from the positive end of their respective axes. A positive rotation angle $\theta$, for example, is shown in Fig. 7 as $\theta_{32}$ 710 for transforming coordinate system ${}^{3}X{}^{3}Y{}^{3}Z$ 706 to coordinate system ${}^{2}X{}^{2}Y{}^{2}Z$ 704. In contrast, $\theta_{12}$ 712 is negative, since the transformation is made from coordinate system ${}^{1}X{}^{1}Y{}^{1}Z$ 708 to coordinate system ${}^{2}X{}^{2}Y{}^{2}Z$ 704, which represents a clockwise rotation. Arrow 702 indicates the counterclockwise rotation.

After applying the above example homogeneous transformation to each of the (X,Y,Z) images 812 generated from the panoramic 3D capturing system 802, a sequence of transformed (X,Y,Z) images 814, converted from each of the local three-dimensional coordinate systems to the selected reference three-dimensional world coordinate system, is produced. The sequence of transformed (X,Y,Z) images is then stitched together in the image stitching system 806, where the sequence of (R,G,B) images is also stitched. Since images are a perspective projection of real-world objects onto a plane, an inherent distortion exists. In order to remove this distortion and keep the sizes of objects consistent between the inter-pair images, the (R,G,B) and corresponding transformed (X,Y,Z) images must first be warped from a planar surface to another domain, such as a cylindrical surface, forming a plurality of warped images; the predetermined warp function W can be used for this purpose. Then, the pre-identified registration points of adjacent sets of overlapping cylindrically warped (R,G,B) images are used to stitch together the cylindrically warped (R,G,B) images to form an (R,G,B) panorama 818. Likewise, adjacent (inter-pair) sets of overlapping cylindrically warped (X,Y,Z) images can be stitched together to form an (X,Y,Z) panorama 816. Both the (R,G,B) and (X,Y,Z) panoramas are then input to the graphics display system 808, such as the aforementioned VRML system, for visualization.
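
The warp function W is defined elsewhere in this document. Assuming it takes the common cylindrical form $u = f\arctan(x/f)$, $v = f\,y/\sqrt{x^2 + f^2}$, with $f$ in pixels, it can be sketched by inverse-mapping each output pixel back to the planar image. The sketch uses nearest-neighbour sampling so that the same routine can be applied to (X,Y,Z) images; intensity images could use bilinear sampling instead. `cylindrical_warp` is a hypothetical name, and $f$ should exceed $(W-1)/\pi$ so that $|u/f| < \pi/2$.

```python
import numpy as np

def cylindrical_warp(img, f):
    """Warp a planar image onto a cylinder of radius f (pixels), nearest-neighbour sampling.

    Works for (H, W) or (H, W, C) arrays; output pixels whose source falls
    outside the input image are left at 0.
    """
    h, w = img.shape[:2]
    cx, cy = (w - 1) / 2, (h - 1) / 2
    out = np.zeros_like(img)
    u, v = np.meshgrid(np.arange(w) - cx, np.arange(h) - cy)  # cylinder coordinates
    x = f * np.tan(u / f)           # inverse of u = f * atan(x / f)
    y = v / np.cos(u / f)           # inverse of v = f * y / sqrt(x**2 + f**2)
    src_c = np.rint(x + cx).astype(int)
    src_r = np.rint(y + cy).astype(int)
    valid = (src_c >= 0) & (src_c < w) & (src_r >= 0) & (src_r < h)
    out[valid] = img[src_r[valid], src_c[valid]]
    return out

img = np.arange(25.0).reshape(5, 5)
print(cylindrical_warp(img, f=3.0))
```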
The present invention is preferably practiced in an image
processing system including a source of digital images, such as a scanner; a 
computer programmed to process digital images; and an output device such as a 
graphics display device, a thermal printer, or an inkjet printer. The method of the 
present invention may be sold as a computer program product including a 
computer readable storage medium bearing computer code for implementing the 
steps of the invention. Computer readable storage medium may include, for 
example: magnetic storage media such as a magnetic disc (e.g. a hard disk or a 
floppy disc) or magnetic tape; optical storage media such as optical disc or optical 
tape; bar code; solid state electronic storage devices such as random access 
memory (RAM) or read only memory (ROM); or any other physical device or 
medium employed to store a computer program. 

The invention has been described with reference to a preferred 
embodiment. However, it will be appreciated that variations and modifications 
can be effected by a person of ordinary skill in the art without departing from the 
scope of the invention. 





PARTS LIST 
706 pre-transformation coordinate system (A)

708 pre-transformation coordinate system (B)

710 angle between 706 and 704

712 angle between 708 and 704

800 three-dimensional panoramic imaging system

802 3D capturing and XYZ image generation step

804 3D reference coordinate system

806 image stitching system

808 graphics display system

810 intensity (RGB) image sequence

812 spatial (XYZ) image sequence

814 transformed spatial (XYZ) image sequence

816 stitched spatial (XYZ) panorama

818 stitched intensity (RGB) panorama